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ABSTRACT 

Measuring the two-point correlation function of the galaxies in the Universe gives access to the underlying dark matter 
distribution, which is related to cosmological parameters and to the physics of the primordial Universe. The estimation 
of the correlation function for current galaxy surveys makes use of the Landy-Szalay estimator, which is supposed to 
reach minimal variance. This is only true, however for a vanishing correlation function. We study the Landy-Szalay 
estimator when these conditions are not fulfilled and propose a new estimator that provides the smallest variance for 
a given survey geometry. Our estimator is a linear combination of ratios between pair-counts of data and/or random 
catalogues (DD, RR and DR). The optimal combination for a given geometry is determined by using log- normal mock 
catalogues. The resulting estimator is biased in a model dependent way but we propose a simple iterative procedure to 
obtain an unbiased model independent estimator. Using various sets of simulated data (log-normal, second-order LPT 
and N-Body), we obtain a 20-25% gain on the error bars on the two-point correlation function for the SDSS geometry 
and ACDM correlation function. When applied on to SDSS data (DR7 and DR9), we achieve a similar gain on the 
correlation functions which translates in a 10-15% improvement on the estimation of the densities of matter, Q m , and 
dark energy, Qa in open LCDM model. The constraints derived from DR7 data with our estimator are similar to those 
obtained with the DR9 data and the Landy-Szalay estimator which covers a volume twice larger and with a density 
three times higher. 
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Introduction 

The distribution of galaxies in the Universe is an extremely 
rich source of information for cosmology since galaxies trace 
the underlying dark matter field. Studies of galaxy cluster- 
ing at various redshifts therefore allow access to the expan- 
sion history of the Universe and, using models for this evo- 
lution, to cons train cosmology and funda mental physics [for 
example, see, |Liddle and Lyth, (2000)] . On larger scales 
where fluctuations are still small, one can apply linear the- 
ory and have a direct access to cosmological parameters. On 
smaller scales, gravity acts in a nonlinear manner and the 
galaxy clustering allows one to investigate the structuration 
of dark matter into halos. 

Observing the large-scale structure of the Universe 
is a promising approach for improving our understand- 
ing of its accelerated expansion observed by various cos- 
mological probes in the last decade. The cosmic ac- 
celeration was initially proposed to reconcile the ap- 
parent low matter content of the Universe with a 



flat g eometry in a standard Cold Dark Matter sce- 
nario Efstathiou et al., (1990) . The first convicing mea- 
surement of cosmic acceleration came from observations 
that type la supernovae appeared less luminous than 
expected in a decelerati ng Universe [Riess et al., (19 98) , 
Perlmutter et al., (1999) . These observations can be ac- 
comodated by modifying General Relativity on cosmo- 
logical scales or, within a Friedmann-Lemaitre-Robertson- 
Walker (FLRW) cosmology, by adding a Dark Energy 
component with a density Qx ~ 0.7, a negative pres- 
sure, and a possibly evolving equation of state. Since 
then, the cosmic acceleration has been confirmed by other 
probes with abundant data, inclu ding Cosmic Microwave 
Background (CMB) fl uctuations |Komatsu et al., (201 1)| 

Integrated Sachs-Wolfe (1SW) 
and Baryon ic Acoustic 



iSherwin et al., (2011 
effect [H ranett et al 



% Integ 
1772009) 



Oscillations (BAO) (| Weinberg et al (2012)| for a general 
review and |Anderson et al., (2012) | for the latest mea- 
surement). These data point towards a Dark Energy 
with a constant equation-of-state parameter, w = — 1, 
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or equivalently a pure cosmological constant. Baryonic 
Acoustic Oscillations measurements are based on the ob- 
servation of an acoustic peak in the correlation func- 
tion of the matter density fluctuations, corresponding to 
the acous tic horizon at the epoch o f matter-radiation de- 
coupling [Eisenstein and Hu, (1998)] . The acoustic scale is 
used as a standard ruler at various redshifts, allowing for 
the measurement of the angular distance in the transverse 
directions and the expansion rate in the radial direction 
Reid et al.;~2012| . 



When investigating the large-scale structure of the 
Universe using galaxies as a tracer of dark matter, one needs 
large-field-of-view deep galaxy surveys such as Sloan Digital 
Sky Survey (SDS S-III) Baryon Oscillati on Spectroscopic 
Survey (BOSS) |Eisenstein et al., (201 1)] ? i.e. high density 
galaxy catalogues, where the radial positions of galaxies 
are measured by their redshifts. The two-point correlation 
function is commonly used for characterizing the large scale 
structure within such galaxy surveys. The fact that one 
does not directly measure the density within the survey 
volume, but samples this density through galaxy locations, 
makes the estimation of the two-point correlation function 
more complex. The observed quantity is the average num- 
ber of neighbors at a given distance in the survey volume 
and is biased by the fact that galaxies near the edges of 
the catalogue volume have less neighbors than they should 
have, which needs to be corrected for in an optimal way. 
This issue does not occur, for example when directly mea- 
suring a function of the mat ter density through t he Lyman- 
a forest of distant quasars |Slosar et al., (2011)] . 

In this article, we introduce a novel estimator for the 
two-point correlation function of galaxies. Its performance 
can be optimized for a given galaxy survey geometry. In 
section [I] we motivate this effort, showing that various well- 
known estimators for the two-point correlation function 
have a bias and a variance that strongly depend on the 
sur vey geometry. The commonl y used Landy-Szalay estima- 



tor |Landy and Szalay, (1993)] has been shown to be both 
unbiased and of minimal variance in the limit of a van- 
ishing correlation function. We show that in realistic cases, 
where the correlation function is not zero, the Landy-Szalay 
estimator does not reach the Poisson noise limit. For ped- 
agogical reasons, we start in section [2] with a simpler but 
biased version of our optimal estimator and we develop in 
section [3] a simple iterative procedure that allows the final 
estimator to be model independent, with an improvement 
of the accuracy around 20-25% with respect to the Landy- 
Szalay estimator. In section [4] we apply our final estima- 
tor to data from the SDSS-II Seventh Data Release (DR7) 
Luminous Red Galaxy sample and on the SDSS-III/BOSS 
DR9 "CM ASS " sample and show the improvement on the 
two-point correlation function measurement and cosmolog- 
ical parameters with respect to previous analyses. 



Motivations for an optimized two-point 
correlation function estimator 



1.1. Commonly used estimators 



Estimators 
tion £(s) 
tion) have 



correlation func- 



of the two-point 
(s being the comoving separa- 
been studied by various authors 



Peebles and Hauser, (1974)1 Davis and Peebles, ( 1983) 



iewett, (1982)|Hamilton, (1993)|Landy and Szalay, (1993 



Generically, pair counts in data are compared to pair counts 
in random samples that follow the geometry of the survey. 
Let us assume a catalogue of n d objects in the data sample 
and n r in the random sample and calculate three sets of 
numbers of pairs as a function of the binned comoving 
separation s^j 

— within the data sample, leading to dd{s) that can be 
normalized to the total number of pairs as: 

DD(s) = M{S \, . 
V ; n d (n d - l)/2 

— within the random sample, leading to rr(s) normalized 
as: 

v J n r (n r - l)/2 

— among both samples (cross correlation) leading to dr(s) 
normalized as: 

DR{s) = 

n r n d 

The most common estimators discussed in the literature 



are: 



DD 

~RR ~ 
DD 



Peebles and Hauser, (1974) 



DR 



RR 



Hewett, (1982) 



€dp(s) 
£h(s) = 
£ls(s) -- 



DD 

= ~dr ~ 1 

DD x RR 

DR 2 
DD - 2DR 



Davis and Peebles, (1983) 



Hamilton, (1993) 



RR 



RR 



Landy and Szalay, (1993) 



Some studies have compared the behavior of the 
different two-correlation- function estimators, mainly in 
the small-scale regime an d using smaller samples. In 
|Pons-Borderia et al. (1999)] 6 estimators were analyzed, 
including both the Hamilton and Landy-Szalay estima- 
tors, and the authors did not fi nd any ou tstanding 
winner among those estimators. In Kerscher (1999) and 



|Kerscher, Szapudi and Szalay (2000)] 9 estimators were 
considered and the estimators presenting the best proper- 
ties were the Landy-Szalay and Hamilton estimators. 

1.2. Relative performances of the common estimators 

To compare the performances of these estimators, we have 
used tw o sets of 120 mock cat alogues obtained from log- 
normal [Coles and Jones, (1991)| density field simulations 
containing about 271,000 galaxies in both a cube of 1 IrT 1 
Gpc size and a far more co mplex geometry corresp onding 
to the BOSS (DR9) survey [Anderson et al. (2012)] which 
contains roughly the same volume as the cube. In addition 
we used random catalogues with three as many galaxies 
than the mock catalogues for both geometries. The cosmol- 
ogy used for the lognormal fields is taken from the WMAP 



7 years analysis [Komatsu et al., (2011)^ 

Fig. [I] shows the correlation function obtained with the 
different estimators for the cubic and DR9 geometries. We 



1 The number of pairs can be spherically averaged in the 
simplest approach. Its dependance on the angle with respect 
to the line of sight can be considered in a more elaborated 
analysis, in order to account for the sensitivity to angular dis- 
tanc e in the transverse direction a nd H(z) in the radial one 
(see [Cabre and Gaztanaga, (2008)] for details). 
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Fig. 1. Input two-point correlation function (dotted black line) and reconstructed ones using the various estimators 
available in the literature (solid lines of various colors) for a cubic geometry (left) or a realistic (BOSS DR9) survey 
volume (right). The dashed lines of various colors represent the RMS of the corresponding estimators. The Hamilton and 
Landy-Szalay lines are exactly superposed as well as the Davis-Peebles and Hewett lines. (Coloured version of the figure 
is available online) 



clearly see differences between the performances of the es- 
timators in the cube and in the DR9, either their mean 
result (solid lines) or their root-mean-square errors, RMS 
(dashed lines). In the case of the DR9 geometry, the mean 
result obtained with the Peebles-Hauser, Davis- Peebles and 
Hewett estimators is more biased with respect to the the- 
ory at large scales. Landy-Szalay and Hamilton estimators 
are much less biased than the others in this more complex 
geometry. Examining at the RMS, all estimators have their 
accuracy degraded by the effects of geometry. Landy-Szalay 
and Hamilton again show best performances with the low- 
est variances in both geometries as expected. 



Landy and Szalay, (1993) accounting for the finite size of 
the random catalogue. It appears that, when the correla- 
tion function is not vanishing, the Landy-Szalay estimator 
does not reach the Poisson noise limit. This suggests that 
a better estimator can be found in the case of a nonvan- 
ishing correlation function and a more complicated survey 
geometry. 



1.3. Optima I ity of the Landy-Szalay estimator 

In the limit of an infinitely large random catalogue, 
for which the volume is much larger than the ob- 
served scales, and a vanishing two-point correlation 
function (uniform galaxy distribution), the Landy- 
Szalay estimator is known to be unbiased and of min- 
imal variance. It is therefore the most widely used 



Eisenstein et al., (2005 



in modern galaxy surveys [e.g., Eisenste in et al., (2j 
Percival et al, (2007)|Kazin et al., (2010)|Blake et al, (20 
Anderson et al., (2012)[ Sanchez et al., (2012) . In practice 



2. An optimized estimator 

2.1. General form and optimization criterion 

Our search for a better estimator started from the observa- 
tion that the commonly used estimators are linear combi- 
nations of ratios of pair counts, DD, DR and RR (here- 
after the s dependence is described by vectors), with the 
exception of the Hamilton estimator, which involves ratios 
of second order products of pair counts. We therefore in- 
vestigate an estimator which would be an optimal linear 
combination of all possible ratios Ri up to second order. 
Table □ summarizes the six ratios at first order and the 
twelve at second order. The generic optimal estimator can 
Yyjthen be expressed as: 



the volume of modern survey is sufficiently large and one 
can also produce a large enough random catalogue, but 
the correlation function to be measured is nonzero. So it is 
crucial to check bias and variance of estimators in case of 
realistic non zero correlation functions. 

Using additional lognormal simulations, we have inves- 
tigated the RMS of the Landy-Szalay estimator as a func- 
tion of the size of the random catalogue for both a zero 
correlation function and the one expected from the ACDM 
scenario. Fifty realizations were produced in both cases 
where a cubic geometry was used in order to be insensi- 
tive to the degradation due to the survey geometry. The 
resulting RMS are shown in Fig. [2j along with the ex- 
pectations for an optimal estimator (from equation 48 in 



€ ( c ) 
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CO 



(2) 



(1) 



The nineteen q coefficients are optimized in order to 
minimize the variance of the estimator for a given geometry. 
This optimization is done through a y 2 minimization using 
a large set of mock catalogues generated using lognormal 
fields, for which DD, DR and RR are stored, so that all 
the Ri terms can be calculated. The y 2 is minimized with 
respect to the vector of parameters c as: 



e [r («) 



LS 
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Fig. 2. RMS of the Landy-Szalay estimator for lognormal simulations in a cubic geometry and for either a zero two-point 
correlation function (left) or the AC DM expectation (right). The RMS is shown in different colors for various sizes of 
the Random sample relative to the data sample. A dotted line of the same color shows the expectation for an optimal 
estimator in each case. Finally, the ultimate limit, corresponding to an infinite size of the random sample, is shown as a 
black dashed line. (Coloured version of the figure is available online) 



Table 1. The nineteen ratios formed by using pair counts 
up to second order. 
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* opt 

where the j index stands for the j-th realization, £ is the 
vector of the values of the estimator in the comoving dis- 
tance s bins and £ t h the vector for the theoretical input cor- 
relation function. The quantity TVls is the covariance ma- 
trix of fluctuations of around the mean Landy-Szalay 
correlation function (£ LS ), the mean taken over the mock 
realizations. 

This approach will result in an estimator with a vari- 
ance at most as large as that of the Landy-Szalay estimator, 
but which might have a significant bias. This bias can be 
calculated and corrected for to an arbitrary precision for a 



given input correlation function (therefore for a given cos- 
mological model). However, this bias correction is model 
dependent and would only work perfectly when the input 
cosmology in the mock data matches the one to be mea- 
sured in the real data. In section [3] we propose a simple 
iterative method that allows efficient circumvention of this 
problem. 

2.2. Performances on simulations 

Using lognormal simulations, we produced 120 realizations 
of galaxy catalogues with a geom etry similar to that of the 
SDSS-III/BOSS (DR9) survey [Eisenstein et al., (2011)[ 
Anderson et al., (2Q12)| . The fiducial cosmology was de- 
fined by h = 0.7, = 0.27, Q A = 0.73, n b = 0.045, 
erg = 0.8 and n s = 1.0. For each realization, we generate 
a random catalogue with the same geometry and calculate 
DD, DR and RR for comoving separations between and 
200 h' 1 Mpc with bins of 4 h' 1 Mpc. We then calculate 
the Landy-Szalay estimator for each simulation £^ s , the 

average estimator (£ LS ), and its covariance matrix TVls, 
empirically, i.e., from the dispersion of the individual real- 
izations. 

We then have all the ingredients required to minimize 
the x 2 m Eq.|2]and obtain an optimal estimator, which was 
done by limiting the \ 2 to the region [40, 200] h~ x Mpc. 
The actual value of the coefficients q is actually not 
particularly meaningful for two reasons: it depends on the 
geometry of the survey and is therefore not "general"; 
in addition, the parameters are degenerate because the 
nineteen Ri terms are not independent. 

Fig. [3] compares our estimator to the Landy-Szalay es- 
timator. The residual with respect to the theoretical input 
correlation and the RMS are shown for both estimators. 
These RMS are just the square-roots of the diagonal el- 
ements of the estimator covariance matrix, calculated em- 
pirically from the individual realizations. The Landy-Szalay 
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Fig. 3. Residuals (solid) and RMS (dashed) of the Landy- 
Szalay estimator (blue) and of the estimator fitted to mini- 
mize the variance (red) as explained in the text. (Coloured 
version of the figure is available online) 
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Fig. 4. [Top panels] Covariance matrices multiplied by 
the square of the comoving distance. [Bottom panels] 
Correlation matrices for both the Landy-Szalay estimator 
(left panels) and the optimized estimator (right panels). 
(Coloured version of the figure is available online) 



2.3. Model dependance 

Fig. [3] and Fig. [4] show that by correcting the optimized 
estimator by its average bias, which can be known with ex- 
cellent accuracy by having a large number of mock realiza- 
tions, one can achieve a better accuracy on the correlation 
function than the Landy-Szalay estimator. Unfortunately, 
the bias exhibits a peak at the location of the BAO scale 
and therefore will be different in another cosmology: it is 
strongly model dependent. If one uses an estimator opti- 
mized with a set of simulations that assumed a cosmology 
different from the actual one, the peak position in the bias 
will be different from that in the data, resulting in a strong 
distortion of the peak shape after bias correction and in a 
shift in its location. This is illustrated in Fig. [5j which will 
be further discussed latter. Fortunately, one can eliminate 
the cosmology dependence of the fitting, as described in 
next section. 



3. Iterative optimized estimator 

To transform the optimized estimator into a model indepen- 
dent one, we investigated the possibility of iterating with 
an estimator that assumes the same cosmology as that de- 
rived from the data. Such a procedure could be quite time 
consuming, as one needs a large number of mock realiza- 
tions for a given cosmology to optimize the estimator for 
this cosmology. We have found a way to do this efficiently, 
limiting the number of simulations to a few times the initial 
one. 



3.1. Description of the method 

Our iterative procedure starts with a first calculation of 
the correlation function using the Landy-Szalay estima- 
tor. We then fit the resulting correlation function with 
a model that has considerable freedom on the general 
broadband shape, so that it is essentially sensitive to the 
loc ation of the acoustic p eak, as used in BOSS analy- 
sis [Anderson et al., (2Q12)| : 



Cdata(<§) = b 2 theory (as) + a 



02 
Q 2 



(3) 



Oi 

s 

where theory is the th eoretical linear model from 
Eise nstein and Hu, (1998) | b is the constant galaxy dark- 
matter bias factor and oqT cl\ and are nuisance parame- 
ters. 

From this fit, we obtain the first iteration of the dilation 
scale parameter, a, that characterizes the location of the 
peak: 



estimator is essentially unbiased while a significant bias is 
observed for our estimator, which, however, remains within 
the la range. In contrast, our optimized estimator appears 
to have smaller variances than the Landy-Szalay estimator 
in the region [40, 200] IrT 1 Mpc , where the fit was per- 
formed. Fig. [4] suggests that the covariance and correlation 
matrices for the Landy-Szalay and optimized estimators. 
The latter has a smaller covariance matrix and no extra 
correlation between the bins. 



a = 



Dv 



(4) 



where r s is the comoving sound horizon at decoupling; the 
subscript / means that the quantity is calculated using our 
fiducial cosmology, for which r s = 157.42 Mpc; Dy (z) is the 
sph erically averaged dis tance to redshift z and is defined 
by [Mehta et al., (2011)]: 



Dy(z)= (1 + Z) 



^ 2 D 2 A (z)cz 
H(z) 



1/3 



(5) 



The parameter a is unity if the actual cosmology 
matches the fiducial one. The result of this fit is a first 
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estimate of the cosmological model suggested by the data, 
labeled by ctQ. This is actually the result of the standard 
analysis with the Landy-Szalay estimator. 

We then perform a large number of realizations of mock 
catalogues with the same DR9 geometry as the data, for 
various values of a around olq. We use lognormal simula- 
tions that can be quickly generated. For this work we have 
simulated 120 realizations of 9 different cosmologies such 
that the dilation parameter a covers the range [0.96, 1.04] 
in steps of width 0.01. For each realization we used the same 
number of galaxies, about 271,000, and a random catalogue 
15 times larger. This step does not require more than a few 
days on desktop machines. 

For the set of 120 simulations corresponding to a given 
input a/e, one can find the coefficients by minimizing 
the x 2 defined in Eq. [2] The resulting correlation function, 
£j(ck), for the realization j is given by Eq. [I] We compute 
the average bias B^ of the correlation function with respect 
to the theory theory («fc) : 



- £theory(a/e) 



(6) 



The covariance matrix A/ opt (a/ c ) is obtained from the fluc- 
tuations of the same 120 realizations : 



N opt (a k ) = ( - l{c k )] fo(c fc ) - |(c fe )f } . • (7) 

Hereafter we redefine the process of applying the esti- 
mator corresponding to to a data sample by two steps: 

— use Eq. Ill with coefficients c& to calculate tdata( c ^) 

— add the bias of the estimator, B^. 

We can now proceed with the iterative procedure. Since 
the first iteration value, a = ao, is not exactly one of the 
nine available we apply the estimator corresponding to 
the closest two values of ckq, <^io and a^, to the data, and we 
interpolate between the two resulting correlation functions: 

€°P t (a ) = (l-t)€(ai o )+U(a hi ), (8) 

where t = (ao — <^io)/(<^hi — &\o)- Similarly, the covariance 
matrix can be written as a function of the two covariance 
matrices N opt (a\ ) and A / opt (ahi) as: 



N opt (a) = (1 - t) 2 N opt (a lo ) + t 2 N opt (a hi ) 
+ t(l - t) Copt(aq OJ Oihi) , 



(9) 



where C opt (a\ Q , a^) is the cross- covariance between 
| opt (a lo ) and £ opt (a hi ) given by: 



Copt(ai 



r pt (a lc ) 



xopt, . 
€ ("lo; 



r pt Ki)-r pt Ki) 



r pt Ko 
r pt (« l0 ) - 



-rVo 



l t 



(10) 

Finally, we fit the correlation function (Eq. |8| with the 
template (Eq. [3]) using the covariance matrix (Eq. [9| , which 
yields a new value ql\ at the second iteration. We then iter- 
ate until the estimated varies less than a given quantity 
(Aa = 0.0001) between two successive iterations. In prac- 
tice, convergence is achieved after a few iterations. 
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Fig. 5. Difference between the mean ^Measured and ai nput 
as a function of ai npu t for lognormal simulations, when ap- 
plying four estimators: the Landy-Szalay estimator (blue 
points), the Iterative Optimal Estimator (red points), and 
two no n- iterative estimators with a = 0.96 (yellow points) 
and 1.04 (green points). Dashed lines of the same colors, fit- 
ted to the points, emphasize the biases. (Coloured version 
of the figure is available online) 



3.2. Performance on simulations 

In this section, we investigate the properties of the iter- 
ative optimal estimator on mock catalogs. We start with 
the lognormal simulations used to optimize the estimator 
and show that we derive an estimator that is indeed in- 
dependent of the input cosmological model. We obtain an 
20-25% increase in the accuracy on the a parameter with 
these simulations. We also test our estimator on other simu- 
lations than the lognormal ones used to optimize it. These 
are more realistic simulations than the lognomal simula- 
tions; they were produced in the framework of the SDSS- 
III/BOSS galaxy clustering working groups. 



3.2.1. Lognormal mock data 

As an illustration, we first considered what happens when 
we do not use the iterative procedure. The nine different 
sets of 120 lognormal simulations provide nine different op- 
timal estimators, defined by Ck and B^. We choose two of 
them to apply to the nine sets of simulations without the it- 
erative procedure. We fit the resulting correlation function 
with the template of Eq. [3] to obtain the scale parameter, 
^Measured- In Fig.|5j (^Measured is averaged over the 120 sim- 
ulations with the same given ai nput , and its difference with 
c^input is plotted versus ainput- The non- iterative estimators 
do not recover ai nput . This result occurs because the peak- 
shaped (Fig. |3| bias correction, B(a,r) slightly shifts the 
peak position of the data to the left (a = 0.96) or to the 
right (a = 1.04). As emphasized by the linear fits in the 
figure, the bias increases with the difference a — ai nput . 

Fig. [5] also demonstrates what happens with the itera- 
tive optimal estimator. The iterative procedure indeed re- 
moves the bias, since the iterative optimal estimator ap- 
pears to be unbiased. The error bars in the figure are RMS 
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Fig. 6. The error on ^Measured for the iterative optimal (red 
points) and Landy-Szalay (blue points) estimators for the 
nine sets of realizations with different ai npnt . (Coloured ver- 
sion of the figure is available online) 



of the values of ^Measured for the 120 different simulations. 
The optimal estimator gives smaller RMS than the Landy- 
Szalay estimator. This result is confirmed in Fig. [6j which 
shows this RMS as a function of ai npu t- The gain obtained 
with the optimal estimator is ^22% relative to the Landy- 
Szalay estimator (this number is not a general one; the pre- 
cise value depends on the geometry of the survey) leading 
to similar improvement on subsequent cosmological con- 
straints. 



3.2.2. PTHalos and LasDamas Mock data 

The studies in the previous section were performed by ap- 
plying the iterative optimal estimator to the lognormal sim- 
ulations that were used to optimize the estimator. We re- 
peated the calculations using two other sets of mock cat- 
alogues that have ve ry similar geometries, b ased on the 
BOSS DR9 footprint [Anderson et al., (2012)[ . 

The first set is based on 2nd order 
Lagrangian Perturbation Theory matter field 
Scoccimaro and Sheth, (2002)1 and halo occ upation 

[Manera et al., (2012)] . A to- 
with h = 0.7, 



function named PTHalos 
tal of 610 realizations were produced 
tt m 0.274 Q A 0.726, Q b h 2 = 0.0224, a 8 = 0.8 and 
n s = 0.97. |5 As the fiducial cosmology used to compute 
the comoving distances of the galaxies is slightly different 
than the one used in mock catalogues, a (Eq. [4| is not 
expected to be 1 but 1.002. 

The second set is even more realistic; it uses N- 
body simulations, named Large Suite of Dark Matte r 
Simulations (LasDamas) McBride et al., in preparation ; 
developed within the SDSS I-II galaxy clustering working 
group for the DR7 LRG analysis. A total of 153 realiza- 
tions were produced assuming a flat ACDM cosmology with 
Q b 0.04, Q m 0.25, h 0.7, n s 1.0 and cr 8 = 0.8, for 
which a is expected to be 0.988. 

Fig. [7] shows the "pull" histogram of the correlation 
functions, i.e., the residuals of the correlation functions rel- 
ative to the average Landy-Szalay correlation function, nor- 
malized to the empirical RMS of the Landy-Szalay estima- 
tor, (£— (£ls})/&ls- By construction, the width of the pull 
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Fig. 7. "Pull" distribution of correlation functions mea- 
sured with PTHalos (top) and LasDamas (bottom) mocks 
in the range 40 < s < 200 h~ x Mpc with the Landy-Szalay 
(blue) and the Iterative Optimal estimator (red). The stan- 
dard deviation of the Gaussian fit shows a smaller scatter 
for the latter estimator. (Coloured version of the figure is 
available online) 



distribution for the Landy-Szalay estimator is close to 1 for 
both sets of mock data, while for the iterative optimal es- 
timator it is 0.80 (PTHalos) and 0.84 (LasDamas), which 
is similar to the 22% gain on the error bars obtained with 
the lognormal simulations. 

This result is confirmed by Fig. 8j which shows the co- 
variance and correlation matrices obtained with both esti- 
mators on the PTHalos mocks. The gain in the covariance 
matrix elements is obvious and is not mitigated by an in- 
crease of the off-diagonal terms in the correlation matrix. 
Fig. [9] shows the same information for the LasDamas sim- 
ulations. The matrices are noisier since we have a smaller 
number of realizations, but the improvement is visible, and 
again the correlation does not change. A small increase of 
the covariance matrix is present between 40 and 60 /i -1 
Mpc, which can be also seen in Fig. [3] However, these scales 
are much smaller than the scales of interest here (BAO 
peak) where we indeed see a clear reduction of the covari- 



For this work we used 598 of the 610 realizations available. 



Finally, Fig. 10 displays the improvement on the esti- 
mation of a obtained using the iterative optimal estimator. 
The scatter of ^Measured with the optimal estimator is re- 
duced relative to Landy-Szalay by 21% for PTHalos and 
17% for LasDamas mocks. These gains are consistent with 
the observed "pull" distribution (Fig. [7| and confirme the 
gain observed with the lognormal simulations. 
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Fig. 8. Covariance matrices times the square of the comov- 
ing distance (top panels) and correlation matrices (bottom 
panels) of PTHalos mock catalogues using Landy-Szalay 
(left panels) and the Iterative Optimal Estimator (right 
panels). (Coloured version of the figure is available online) 
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Fig. 9. Same as figure [s] for LasDamas mock catalogues. 
(Coloured version of the figure is available online) 

4. Application to real data 

4.1. Data description 

We apply our final estimator on two galaxy samples: 
t he SPSS I-II DR7 Lumin ous Red Galaxy sample (LRG) 



CMASS [|Pa^manabhan et al. (2012)], 



Eisenstein et al., (2001)1 and the SDSS-III/BOSS DR9 

Both surveys, SDSS-I-II and SDSS-III/BOSS use 
the same wide field, dedicated telescope, the 2.5 m- 
aperture Sloan Foundation T elescope at Apach e Point 
Observatory in New Mexico |Gunn, et al. (2006)| . Those 
surveys image d the sky at high latitude in the 
ugriz bands Fukugita et al. (1996) , using a mosaic 



|Gunn, et al ( 1998) 



described m 
alogue had 



The SDSS-I-II imaging survey is 
Abaz ajian et al., (2009) [ This galaxy cat- 
been built based on the prescription in 



Eisenstein et al, (2001) | selecting the most luminous galax 
ies since they are more massive and then more biased 
with respect to the dark-matter density field. More details 
about the construct ion of the catalogue can be found in 
Kazin et al., (2010) The galaxies in the 1RG sample have 
redshifts in the range 0.16 < z < 0.47 and a density of 
about 10~ 4 h 3 Mpc -3 . 

The BOSS imaging survey data are described 
the spectrograph design 
^ and 



Aihara et al. (2011)1 



performance m Smee 



S.A., et al., (2012) 
;el 



spectral data reduction s in |Schlegel et al (2012)| 

summar y of BOSS can 
The 



m 

and 
the 
and 



Bolton, A., et al. (2012) 
Dawson 



K., et al. (2012) 



SDSS 



be 
Data 



Ann et al. (2012) CMASS sample of galaxies is 



found m 
Release 9 

constructed using an extension of the selection algorithm 
of DR7 1RG sample in order to detect fainter and bluer 
massive galaxies lying in the redshift range 0.43 < z < 0.7. 
The final density of this sample is 3 x 10 _4 /i 3 Mpc -3 . A 
m ore detailed explanation of t he target selection is given 



Padmanabhan et al. 



(2012)| 



4.2. DR7 

We compared the iterative optimal estimator to the landy- 
Szalay estimator for the estimation of the spherically 
averaged 2-point correlation function of the SDSS DR7 



[Abazajian et al., (2009)] 1RG sample (Fig. [II]). In order 
to estimate the correlation function, we usea a random 
catalogue 15 times larger than the data sample. The co- 
efficients Ck and biases B^{r) for the iterative estimator 
were obtained using the nine sets of lognormal simulations 
as described in section 13.11 The covariance matrix of the 
data correlation function for both estimators comes from 
the 153 realizations of the lasDamas mocks. The left panel 
of Fig. 11 shows the resulting correlation function for both 
estimators. The error bars obtained for the iterative esti- 
mator are smaller than for the landy-Szalay estimator, but 
both curves are consistent with each other. 

The correlation functions were fitted as for the mock 
catalogues, using the template defined by Eq.[3] The result- 
ing values of ^Measured are compatible with unity and the 
error for the optimal estimator is lower by 31%, as shown 
in Table [2j This improvement is larger than the 17% im- 
provement on the mean error observed on mock data but 
it is consistent with the scatter of the errors (Fig. [l3j left). 

In Fig. [12] (left) we use both estimators to compare 
^Measured for the DR7 1RG sample (red points) and 
lasDamas realizations (black points). The DR7 measure- 
ment is well inside the lasDamas cloud, being very close to 
the mean. 

Another way to improve the measurement accuracy of 
the BAO peak is through the reconstruction technique, 
where galaxies are slightly displaced so that the density field 
is as it should be without non- linear structur e growth effects 
|Eisenstein et al., (2007)| . Xu et al., (2012)| used the recon- 
struction technique on the DR7 1RG sample. Before recon- 
struction they obtain a = 1.015±0.044; after reconstruction 
a = 1.012 ± 0.024, an improvement of 45%. Our estimator, 
with an 31% improvement, yields a = 1.006 ±0.018, consis- 
tent with the reconstruction result. This comparison shows 



8 



M. Vargas- Magana et al.: An optimized correlation function estimator for galaxy surveys 



Table 2. Values of a found with two different estimators 
of the correlation function for each sample. 



Sample 


Landy-Szalay 


It. Opt. Est. 


Gain 






CKopt 


% 


Mean LasDamas 


0.976 ± 0.035 


0.979 ±0.029 


17 


Mean PTHalos 


1.013 ±0.039 


1.011 ±0.031 


21 


DR7 


1.004 ±0.026 


1.006 ±0.018 


31 


DR9 


1.010 ±0.018 


1.009 ±0.013 


28 



Table 3. Improvement on cosmological parameters with 
the iterative optimal estimator. 



WMAP7± 








Gain 








Gain 


DR7 (LS) 


0.276 


± 


0.018 




0.727 


± 


0.017 




DR7 (It. Opt.) 


0.274 


± 


0.014 


22% 


0.729 


± 


0.014 


17% 


DR9 (LS) 


0.278 


± 


0.015 




0.725 


± 


0.015 




DR9 (It. Opt.) 


0.278 


± 


0.013 


13% 


0.725 


± 


0.015 


0% 



that it is possible to gain in accuracy in two independent 
ways and the combination of both methods is expected to 
provide even better constraints on cosmological parameters. 

4.3. DR9 

Following the same procedure as for the DR7 LRG sample, 
we computed the spherically averaged 2-point correlation 
function using the Landy-Szalay estimator and the iterative 
optimal estimator. The results as shown in the right panel of 
Fig. [TT] The corresponding values of a are given in Tab. |2j 

We see a clear improvement on the precision of the a 
measurement with respect to the Landy-Szalay one. The 
values are in agreement, but the iterative estimator gives 
us 28% more accurate result. 

As discussed for the DR7 data, ^Measured and its error 
for DR9 CM ASS data are consistent with the measurements 
with PTHalos mocks, as can be seen in Fig. [l2] (right) and 
13] (right) . 

The BOSS DR9 CMASS result [Anderson et al., (2012) 



using the correlation function only is a = 1.016 ± 0.017 
and a = 1.024 ± 0.016 before and after reconstruction 
respectively. However in the case of DR9, the improvement 
of 6% due to reconstruction is much lower that the one 
expected with the new iterative estimator. Meanwhile, this 
result is consistent with our values with both estimators 
(Tab. [2} well within 1-a. 

5. Cosmological constraints 

The improvement on cosmological parameter constraints 
using the iterative optimal estimator is illustrated in 
Fig. [14] These constraints are obtained using a Monte Carlo 
Markov Chain within an open ACDM cosmology using 
CMB data only. The chairf] was re-sampled with our BAO 
a constraints. The marginalized constraints on Q m and Q\ 
are given in Tab. [3] 

The overall gain on the cosmological parameters is be- 
tween 13 and 22% (except for ^ A for DR9). With the iter- 
ative optimal estimator applied on DR7 data, the accuracy 
on Q m and is comparable to that measured with the 
Landy-Szalay estimator applied to the DR9 sample, even 
though the DR9 has a density that is three times larger and 
twice the volume of DR7. 



6. Conclusions 

We have designed a new two-point correlation function es- 
timator, which is a linear combination of all possible ratios 



(up to second order) of pairs counts between data and ran- 
dom samples. The linear combination can be optimized to 
minimize the variance of the correlation function for a given 
geometry. We developed an iterative procedure to make this 
new estimator independent of the cosmology of the sim- 
ulated data used in its optimization. We have shown on 
lognormal, second-order perturbation theory and N-body 
simulations that the decrease in size of the correlation func- 
tion error bars is around 25%, relative to the well known 
Landy-Szalay estimator. The improvement is not mitigated 
by extra correlations in the covariance matrix of the two- 
point correlation function. 

This result is not contradictory with the fact that the 
Landy-Szalay estimator was shown to be of minimal vari- 
ance, since this is true only for a vanishing correlation func- 
tion and a simple geometry. Current galaxy surveys do mea- 
sure a nonzero correlation function even on large scales and 
they have quite complex geometry. 

Finally, we have applied our method to SDSS DR7 
and DR9 data, achieving an improvement of 10-15% on 
the value of the cosmological parameters Q m and ^a. We 
achieve a similar accuracy with our estimator on the DR7 
sample as with the Landy-Szalay estimator on the DR9 
sample. 

Our method can be easily applied to any dataset and 
requires modest extra CPU time, as any analysis anyway 
requires a large number of simulated catalogues to test for 
systematic effects and to estimate the covariance matrix. 

For future developments, we would use Principal 
Component analysis to identify the combination of ratios 
that contributes most to minimize the correlation function 
variance. The optimization could then be limited to the 
most relevant combinations. 

This method can be easily extended to the study of 
the anisotropic correlation function. The coefficients would 
be optimized to simultaneously minimize the variance 
of the monopole and quadrupole. That approach would 
produce better constraints on redshift space distortions 
|Kaiser (1987)] physical parameters and on the Alcock- 
Paczynski test |Alcock and Paczynski (1979)] . 

The optimizer! iterative estimator could be easily 
applied to mark correlation functions as well [e.g., 
Skibba et al. (2006) | [Martinez et al. (2Q1Q)| . 



3 The MCMC from WMAP7 is available at 
http: //lambda. gsfc .nasa.gov/. 
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Fig. 10. Histogram of ^Measured f° r the PTHalos (left) and LasDamas (right) realizations using the Landy-Szalay (blue) 
and the iterative optimal estimators (red). The average values over the realizations, shown in the legend, are represented 
as vertical dashed lines of the same color, and their error as horizontal bars. The expected value is shown as a black 
dot-dashed vertical line. (Coloured version of the figure is available online) 
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Fig. 11. Correlation functions obtained for DR7 LRG (left) and DR9 CMASS (right) data samples using the Landy- 
Szalay (blue points) and the iterative optimal estimator (red points). Their best fit is shown by solid lines and ^Measured 
is given in the legend together with the x 2 /d.o.f. and its probability. The covariance matrices used for these fits for each 
iteration of the fit are based upon the LasDamas and PTHalos mocks covariance matrices, respectively, also obtained 
using both estimators. The error bars are the square root of the diagonal elements of these matrices. (Coloured version 
of the figure is available online) 
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